Time-Frequency Trade-offs for Audio Source Separation with Binary Masks

Author

  • Andrew J. R. Simpson
Abstract

The short-time Fourier transform (STFT) provides the foundation of binary-mask-based audio source separation approaches. In computing a spectrogram, the STFT window size parameterizes the trade-off between time and frequency resolution. However, it is not yet known how this parameter affects the operation of the binary mask in terms of separation quality for real-world signals such as speech or music. Here, we demonstrate that the trade-off between time and frequency in the STFT, used to perform ideal binary mask separation, depends upon the types of source to be separated. In particular, we demonstrate that different window sizes are optimal for separating different combinations of speech and musical signals. Our findings have broad implications for machine audition and machine learning in general.
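The role of the window size in ideal binary mask separation can be illustrated with a minimal sketch. It is not the author's experimental code: the helper names (`stft`, `ideal_binary_mask`, `masked_snr_db`), the synthetic tone and click-train signals, the 50% hop, the Hann window, and the spectrogram-domain SNR measure are all assumptions chosen for illustration. The ideal binary mask keeps each time-frequency bin of the mixture in which the target's magnitude exceeds the interferer's, and the loop repeats the measurement for several window sizes.

```python
import numpy as np

def stft(x, win_size, hop=None):
    """Hann-windowed STFT; returns an array of shape (frames, win_size // 2 + 1)."""
    hop = hop or win_size // 2  # assumed 50% overlap
    window = np.hanning(win_size)
    n_frames = 1 + (len(x) - win_size) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_size] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def ideal_binary_mask(target_spec, interferer_spec):
    """True where the target dominates the interferer in a time-frequency bin."""
    return np.abs(target_spec) > np.abs(interferer_spec)

def masked_snr_db(target, interferer, win_size):
    """Spectrogram-domain SNR (dB) of the masked mixture relative to the target."""
    T = stft(target, win_size)
    I = stft(interferer, win_size)
    M = stft(target + interferer, win_size)  # spectrogram of the mixture
    mask = ideal_binary_mask(T, I)
    err = mask * M - T  # masked-mixture estimate minus the true target
    eps = 1e-12  # guards against division by zero
    return 10 * np.log10(
        (np.sum(np.abs(T) ** 2) + eps) / (np.sum(np.abs(err) ** 2) + eps)
    )

# Synthetic stand-ins for real sources (assumptions, not data from the paper):
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)  # narrow in frequency, spread in time
clicks = np.zeros(fs)
clicks[::2000] = 1.0                # narrow in time, spread in frequency

for win_size in (256, 1024, 4096):
    print(win_size, round(masked_snr_db(tone, clicks, win_size), 2))
```

The measure here is computed on the spectrogram rather than on resynthesized audio, so it only indicates how the mask behaves as the window size changes; it is not a substitute for the separation-quality metrics used in the paper.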


Similar resources

Music Remixing and Upmixing Using Source Separation

Current research on audio source separation provides tools to estimate the signals contributed by different instruments in polyphonic music mixtures. Such tools can already be incorporated in music production and post-production workflows. In this paper, we describe recent experiments where audio source separation is applied to remixing and upmixing existing mono and stereo music content. 1. AU...


Blind Source Separation Using Mixtures of Alpha-Stable Distributions

We propose a new blind source separation algorithm based on mixtures of alpha-stable distributions. Complex symmetric alpha-stable distributions have recently been shown to model audio signals in the time-frequency domain better than classical Gaussian distributions, thanks to their larger dynamic range. However, inference of these models is notoriously hard to perform because their probability...


Informed algorithms for sound source separation in enclosed reverberant environments

While humans can separate a sound of interest amidst a cacophony of contending sounds in an echoic environment, machine-based methods lag behind in solving this task. This thesis thus aims at improving performance of audio separation algorithms when they are “informed” i.e. have access to source location information. These locations are assumed to be known a priori in this work, for example by ...


Combining Mask Estimates for Single Channel Audio Source Separation Using Deep Neural Networks

Deep neural networks (DNNs) are usually used for single channel source separation to predict either soft or binary time frequency masks. The masks are used to separate the sources from the mixed signal. Binary masks produce separated sources with more distortion and less interference than soft masks. In this paper, we propose to use another DNN to combine the estimates of binary and soft masks ...


A Source Reassignment Technique for Time-frequency Masking Audio Separation

A neighborhood-based source reassignment technique is proposed for use in time-frequency masking audio source separation methods. This technique identifies all the time-frequency clusters that form the separation masks in the Short-Time Fourier Transform (STFT) domain, and labels each time-frequency bin with a value that denotes the size of its corresponding cluster. The bins correspo...



Journal:
  • CoRR

Volume: abs/1504.07372

Publication date: 2015